knitr document van Steensel lab
TF reporter cDNA reads processing - Deep P53/GR scan - stimulation 1
Introduction
I previously processed the raw sequencing data, quantified the pDNA data and normalized the cDNA data. In this script, I take a detailed look at the cDNA data from a general perspective.
Analysis
First insights into the data distribution - reporter activity distribution plots
Explaining expression differences between the different affinities
The site closest to the minimal promoter determines the activity. Can neighboring sites even inhibit this effect?
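One quick way to see which position dominates is to compare, per position, the spread of the affinity coefficients from the Gaussian GLM reported under "Ridge/Lasso regression". The sketch below copies those coefficients by hand. It assumes that the omitted reference affinity level of each position is absorbed into the intercept (effect 0). Reading pos4 as the promoter-proximal site is also an assumption, because the model output does not state the position order. This GLM is purely additive, so it cannot show inhibition between neighboring sites. That question would need interaction terms, for example a hypothetical `affinity_pos3:affinity_pos4` term.

```r
# Main-effect coefficients copied from the Gaussian GLM summary in this report.
# Assumption: each position's reference affinity level is absorbed into the
# intercept, so it is counted as an effect of 0 below.
coefs <- c(
  "affinity_pos1_1_very-weak" =  0.17093, "affinity_pos1_2_weak"   =  0.14411,
  "affinity_pos1_3_medium"    =  0.03793, "affinity_pos1_4_strong" =  0.19941,
  "affinity_pos2_1_very-weak" =  0.09549, "affinity_pos2_2_weak"   = -0.14039,
  "affinity_pos2_3_medium"    = -0.17676, "affinity_pos2_4_strong" = -0.09254,
  "affinity_pos3_1_very-weak" =  0.42701, "affinity_pos3_2_weak"   =  0.25897,
  "affinity_pos3_3_medium"    = -0.02712, "affinity_pos3_4_strong" =  0.10787,
  "affinity_pos4_1_very-weak" =  0.77533, "affinity_pos4_2_weak"   =  0.98144,
  "affinity_pos4_3_medium"    =  0.74495, "affinity_pos4_4_strong" =  0.85772
)

# Extract the position label ("pos1" ... "pos4") from each coefficient name
position <- sub("^affinity_(pos[0-9]).*$", "\\1", names(coefs))

# Spread of the affinity effects at each position (max minus min, reference = 0)
effect_range <- sapply(split(coefs, position), function(b) diff(range(c(0, b))))

print(round(effect_range, 3))   # pos1 0.199, pos2 0.272, pos3 0.454, pos4 0.981
names(which.max(effect_range))  # "pos4"
```

On the log scale, the affinity at pos4 moves predicted activity by up to about 0.98. The other three positions move it by at most about 0.45.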
Ridge/Lasso regression
##
## Call:  glm(formula = log_reporter_activity ~ affinity_pos1 + affinity_pos2 +
##     affinity_pos3 + affinity_pos4, family = "gaussian", data = cDNA_df_p53)
##
## Coefficients:
##               (Intercept)  affinity_pos1_1_very-weak
##                   0.74835                    0.17093
##      affinity_pos1_2_weak     affinity_pos1_3_medium
##                   0.14411                    0.03793
##    affinity_pos1_4_strong  affinity_pos2_1_very-weak
##                   0.19941                    0.09549
##      affinity_pos2_2_weak     affinity_pos2_3_medium
##                  -0.14039                   -0.17676
##    affinity_pos2_4_strong  affinity_pos3_1_very-weak
##                  -0.09254                    0.42701
##      affinity_pos3_2_weak     affinity_pos3_3_medium
##                   0.25897                   -0.02712
##    affinity_pos3_4_strong  affinity_pos4_1_very-weak
##                   0.10787                    0.77533
##      affinity_pos4_2_weak     affinity_pos4_3_medium
##                   0.98144                    0.74495
##    affinity_pos4_4_strong
##                   0.85772
##
## Degrees of Freedom: 1025 Total (i.e. Null); 1009 Residual
## Null Deviance: 405.2
## Residual Deviance: 258.1 AIC: 1532
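The deviances and degrees of freedom above are enough to recover the usual goodness-of-fit numbers. For a Gaussian GLM with identity link, the deviance is the residual sum of squares. R² is therefore 1 - residual deviance / null deviance. The sketch also recomputes the AIC with R's Gaussian formula. This checks that the reported 1532 corresponds to 17 coefficients plus the estimated dispersion. The inputs are the rounded values printed above, so the results are approximate.

```r
# Values reported in the glm() summary above
null_dev  <- 405.2  # null deviance (total sum of squares for a Gaussian GLM)
resid_dev <- 258.1  # residual deviance (residual sum of squares)
df_null   <- 1025   # n - 1
df_resid  <- 1009   # n - p

n <- df_null + 1    # number of observations (1026)
p <- n - df_resid   # number of estimated coefficients (17)

# Gaussian/identity GLM: deviance = RSS, so R^2 follows directly
r2     <- 1 - resid_dev / null_dev
adj_r2 <- 1 - (resid_dev / df_resid) / (null_dev / df_null)

# R's Gaussian AIC: -2 log-likelihood at the ML variance RSS/n,
# plus 2 * (p + 1) because the dispersion is estimated as well
aic <- n * (log(2 * pi * resid_dev / n) + 1) + 2 * (p + 1)

round(c(R2 = r2, adj_R2 = adj_r2, AIC = aic), 3)
```

With these inputs, R² is about 0.363 and adjusted R² is about 0.353. The four position factors therefore explain roughly a third of the variance in log reporter activity. The recomputed AIC (about 1531.7) matches the reported 1532.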
Random forest implementation
Session Info
## [1] "Run time: 1.188214 mins"
## [1] "/DATA/usr/m.trauernicht/projects/SuRE_deep_scan_trp53_gr/stimulation_1"
## [1] "Thu Nov 26 17:53:15 2020"
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.7 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] glmnetUtils_1.1.6 glmnet_4.0-2 Matrix_1.2-18
## [4] randomForest_4.6-14 plotly_4.9.2.1 ROCR_1.0-11
## [7] tidyr_1.0.0 stringr_1.4.0 readr_1.3.1
## [10] GGally_1.5.0 gridExtra_2.3 cowplot_1.0.0
## [13] plyr_1.8.6 viridis_0.5.1 viridisLite_0.3.0
## [16] ggforce_0.3.1 ggbeeswarm_0.6.0 ggpubr_0.2.5
## [19] magrittr_1.5 pheatmap_1.0.12 tibble_3.0.1
## [22] maditr_0.6.3 dplyr_0.8.5 ggplot2_3.3.0
## [25] RColorBrewer_1.1-2
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.1 jsonlite_1.7.1 splines_3.6.3 foreach_1.4.7
## [5] prettydoc_0.4.0 shiny_1.4.0 assertthat_0.2.1 vipor_0.4.5
## [9] yaml_2.2.1 pillar_1.4.3 lattice_0.20-38 glue_1.4.2
## [13] digest_0.6.27 promises_1.1.1 ggsignif_0.6.0 polyclip_1.10-0
## [17] colorspace_1.4-1 htmltools_0.5.0 httpuv_1.5.4 pkgconfig_2.0.3
## [21] xtable_1.8-4 purrr_0.3.3 scales_1.1.0 tweenr_1.0.1
## [25] later_1.1.0.1 mgcv_1.8-31 farver_2.0.1 ellipsis_0.3.0
## [29] withr_2.1.2 lazyeval_0.2.2 mime_0.9 survival_3.1-8
## [33] crayon_1.3.4 evaluate_0.14 nlme_3.1-143 MASS_7.3-51.5
## [37] beeswarm_0.2.3 tools_3.6.3 data.table_1.12.8 hms_0.5.3
## [41] lifecycle_0.2.0 munsell_0.5.0 compiler_3.6.3 rlang_0.4.8
## [45] grid_3.6.3 iterators_1.0.12 htmlwidgets_1.5.2 crosstalk_1.0.0
## [49] labeling_0.3 rmarkdown_2.5 gtable_0.3.0 codetools_0.2-16
## [53] reshape_0.8.8 R6_2.5.0 knitr_1.30 fastmap_1.0.1
## [57] shape_1.4.4 stringi_1.5.3 parallel_3.6.3 Rcpp_1.0.5
## [61] vctrs_0.2.4 tidyselect_1.1.0 xfun_0.19